26 research outputs found

    Pulse Coded Neural Network Implementation In VLSI

    A neural network that encodes signals in terms of pulses has been designed and fabricated. The neural network components are described in detail. As a test case, a two-layer network is implemented. A preliminary test result shows some promise and some limitations of the design.

    Stochastic Computing Correlation Utilization in Convolutional Neural Network Basic Functions

    In recent years, many applications have been implemented in embedded systems and mobile Internet of Things (IoT) devices, which typically have constrained resources and small power budgets yet exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry Convolutional Neural Networks (CNN) on this class of devices, many research groups have developed specialized parallel accelerators using Graphics Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC) can implement CNN with a low hardware footprint and low power consumption. To enable building more efficient SC CNN, this work implements the CNN basic functions in SC so that they exploit correlation, share Random Number Generators (RNG), and are more robust to rounding error. Experimental results show that the proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic function circuits compared to previous work.
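
The role of correlation in SC can be illustrated with a short simulation. This is a generic sketch of unipolar stochastic bitstreams, not the circuits proposed in the paper; the function names (`sc_stream`, `sc_value`, `demo`) and the chosen values are assumptions made for illustration. With independent random sources an AND gate approximates multiplication, whereas with a shared random source (maximally correlated streams) the same AND gate approximates the minimum and an XOR gate approximates the absolute difference.

```python
import random

def sc_stream(p, rands):
    """Unipolar SC encoding: a bit is 1 when the random sample falls below p."""
    return [1 if r < p else 0 for r in rands]

def sc_value(bits):
    """Decode a bitstream as the fraction of ones."""
    return sum(bits) / len(bits)

def demo(p=0.5, q=0.4, n=1 << 14, seed=1):
    rng = random.Random(seed)
    r1 = [rng.random() for _ in range(n)]
    r2 = [rng.random() for _ in range(n)]

    a = sc_stream(p, r1)
    b_indep = sc_stream(q, r2)   # separate random source: uncorrelated with a
    b_shared = sc_stream(q, r1)  # shared random source: maximally correlated with a

    product = sc_value([x & y for x, y in zip(a, b_indep)])    # approx. p * q
    minimum = sc_value([x & y for x, y in zip(a, b_shared)])   # approx. min(p, q)
    abs_diff = sc_value([x ^ y for x, y in zip(a, b_shared)])  # approx. |p - q|
    return product, minimum, abs_diff
```

Calling `demo()` returns three estimates that approach 0.2, 0.4, and 0.1 as the stream length grows. Sharing one random source also removes a generator from the hardware, which is the kind of saving the abstract refers to.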

    FPGA-based real-time moving target detection system for unmanned aerial vehicle application

    Moving target detection is the most common task for an Unmanned Aerial Vehicle (UAV): finding and tracking objects of interest from a bird's eye view in mobile aerial surveillance for civilian applications such as search and rescue operations. The complex detection algorithm can be implemented in a real-time embedded system using a Field Programmable Gate Array (FPGA). This paper presents the development of a real-time moving target detection System-on-Chip (SoC) using an FPGA for deployment on a UAV. The detection algorithm uses an area-based image registration technique that includes motion estimation and object segmentation processes. The moving target detection system has been prototyped on a low-cost Terasic DE2-115 board fitted with a TRDB-D5M camera. The system consists of a Nios II processor and stream-oriented dedicated hardware accelerators running at a 100 MHz clock rate, achieving a processing speed of 30 frames per second for 640 × 480 pixel greyscale video.

    Simultaneous routing and buffer insertion algorithm for minimizing interconnect delay in VLSI layout design

    In deep submicron fabrication technology, transistors can now switch much faster, but wire resistances are now larger, and delay due to wires can exceed gate delay. Consequently, the interconnect delay is the dominant factor in the construction of wire routing in very large scale integrated (VLSI) circuits, which today have feature dimensions in the nanometer range. State-of-the-art circuit design now involves as much the engineering of the wires as the design of transistors. Hence, a successful VLSI design depends heavily on a successful interconnect design. An effective approach for reducing the interconnect delay is buffer insertion (van Ginneken, 1990). In this method, a wire is divided into segments with a buffer inserted between the segments (Cong et al., 1996). Traditionally, buffer insertion is a post-layout optimization technique, implying that the routing paths are first found, and then buffers are inserted in these paths. However, today's VLSI designs typically apply some form of design reuse utilizing pre-designed cells, or macro blocks.

    FPGA Implementation of RSA Public-Key Cryptographic Coprocessor

    The hardware implementation of the RSA algorithm for public-key cryptography is presented. The algorithm depends on the computation of modular exponentials. Critical to this computation is a fast implementation of modular multiplication. A high-performance systolic array architecture for modular multiplication based on the algorithm of Montgomery (1985) is proposed. The design is targeted for implementation in reconfigurable logic, which can yield custom-hardware performance yet maintain all the flexibility of software-based systems. Reconfigurable computing allows the designer to respond, in the prototyping stage, to flaws discovered in implementation or to changes in standards or data formats. We report the issues involved in the preliminary design of the prototype to be fabricated in an Altera FLEX10KE series FPGA mounted on a PCI card.
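
As a software illustration of the arithmetic involved, the following is a minimal sketch of bit-serial (radix-2) Montgomery multiplication and a square-and-multiply exponentiation built on it. It is not the systolic array design from the paper; the function names `mont_mul` and `mont_pow` and the choice of one bit per iteration are assumptions made here. The modulus must be odd, which always holds for an RSA modulus.

```python
def mont_mul(a, b, n, k):
    """Return a * b * 2^(-k) mod n for odd n < 2^k and 0 <= a, b < n."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b  # add b when bit i of a is set
        if s & 1:                # make s even by adding the odd modulus
            s += n
        s >>= 1                  # exact division by 2
    return s - n if s >= n else s

def mont_pow(base, exp, n):
    """Compute base^exp mod n (n odd) using only Montgomery multiplications."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)               # R^2 mod n, with R = 2^k
    x = mont_mul(base % n, r2, n, k)    # base in Montgomery form
    acc = mont_mul(1, r2, n, k)         # 1 in Montgomery form (R mod n)
    for bit in bin(exp)[2:]:            # scan exponent bits, most significant first
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)       # convert back out of Montgomery form
```

Each loop iteration in `mont_mul` needs only an addition, a parity check, and a shift, with no trial division by the modulus, which is why the method maps well onto regular hardware structures.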

    An optimization algorithm based on grid-graphs for minimizing interconnect delay in VLSI layout design

    In this paper, we describe a routing optimization algorithm based on grid-graphs for application in deep-submicron VLSI layout design. The proposed algorithm, named S-RABILA (for Simultaneous Routing and Buffer Insertion with Look-Ahead), constructs a maze routing path simultaneously with buffer insertion and wire sizing, taking into account wire and buffer obstacles, such that the interconnect delay from source to sink is minimized. In current nanometer VLSI layout design, the interconnect delay has become the dominant factor affecting system performance. Research has shown that routing algorithms that include simultaneous buffer insertion and wire sizing are very effective in solving the timing optimization problem in VLSI interconnect design. A key contribution of this work is a novel look-ahead scheme that speeds up the algorithm and aids in finding the exact solution. Hence, the algorithm is accurate, fast, scalable with problem size, and can handle large routing graphs. Experimental results show the effectiveness of the look-ahead scheme and indicate that S-RABILA provides significant performance improvements over similar existing VLSI routing algorithms.
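
For readers unfamiliar with maze routing on a grid-graph, the sketch below shows the plain breadth-first (Lee-style) baseline: it finds a shortest obstacle-avoiding path in grid steps. It is not S-RABILA, since it performs no buffer insertion, wire sizing, delay evaluation, or look-ahead; the function name `maze_route` and the grid encoding (1 marks an obstacle) are assumptions made for this example.

```python
from collections import deque

def maze_route(grid, source, sink):
    """Return a shortest path of (row, col) cells from source to sink, or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == sink:
            path = []
            while cell is not None:  # walk parent links back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                      # sink is unreachable
```

A delay-driven router would replace the step count with a delay cost and carry extra state per grid node (for example, downstream capacitance and buffer choices), which is where the search space grows and pruning schemes become important.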

    Iterative RLC models for interconnect delay optimization in VLSI routing algorithms

    Buffer insertion (van Ginneken, 1990) and wire-sizing techniques (Lillis, Cheng and Lin, 1996) have been widely used to minimize the delay of the global interconnect path between interconnect source and sink points. These techniques rely on delay models (Pileggi, 1995) to estimate buffer insertion points, ranging from the simple first-order linear model (Elmore, 1948) to more complex moment-matching techniques (Ismail, Friedman and Neves, 1999a). Thus, interconnect analysis and modeling is of paramount importance in realizing a successful global interconnect routing. For effective buffer insertion point estimation, both source-to-sink and sink-to-source delay estimation may be used (Shaikh-Husin and Khalil-Hani, 2007). As VLSI fabrication technology scales to smaller feature sizes and larger layout areas, global interconnect delay increasingly dominates device delay (Bakoglu, 1990).
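
The effect of buffer insertion under the first-order Elmore model can be sketched numerically. This is a simplified RC illustration, not the iterative RLC models of the paper: the wire is treated as a uniform distributed RC line, buffers are placed at equal spacing, the source is modelled as one more buffer, and all names and parameter values are hypothetical (resistance in ohms, capacitance in fF, length in µm, delay in fs).

```python
def elmore_wire_delay(length, r, c, r_drv, c_load):
    """Elmore delay of a driver (r_drv) plus a uniform RC wire ending in c_load."""
    driver_term = r_drv * (c * length + c_load)             # driver charges wire and load
    wire_term = r * length * (c * length / 2 + c_load)      # distributed wire: quadratic in length
    return driver_term + wire_term

def buffered_delay(length, stages, r, c, r_buf, c_buf, t_buf, c_sink):
    """Total delay when the wire is split into `stages` equal buffered segments."""
    seg = length / stages
    total = 0.0
    for i in range(stages):
        load = c_sink if i == stages - 1 else c_buf          # last segment drives the sink
        total += t_buf + elmore_wire_delay(seg, r, c, r_buf, load)
    return total

# Hypothetical technology numbers, chosen only to show the trend.
params = dict(r=0.1, c=0.2, r_buf=200.0, c_buf=5.0, t_buf=20000.0, c_sink=5.0)
unbuffered = buffered_delay(10000.0, 1, **params)
four_stage = buffered_delay(10000.0, 4, **params)
```

Because the wire term grows with the square of the segment length, splitting the wire reduces the total even after paying the intrinsic delay of each added buffer; with these assumed numbers `four_stage` comes out smaller than `unbuffered`.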